
Our Collective Voices: The Social and Technical Values of a Grassroots Chinese Stuttered Speech Dataset

https://doi.org/10.1145/3715275.3732179

Li, Jingjin; Li, Qisheng; Gong, Rong; Wang, Lezhi; Wu, Shaomei (June 2025, ACM)

The lack of authentic stuttered speech data has significantly limited the development of stuttering-friendly automatic speech recognition (ASR) models. In previous work, we collaborated with StammerTalk, a grassroots community of Chinese-speaking people who stutter (PWS), to collect the first stuttered speech dataset in Mandarin Chinese, containing 50 hours of conversational and command-recitation speech from 72 PWS. This work examines both the technical and social dimensions of the dataset. Through quantitative and qualitative analysis, as well as benchmarking and fine-tuning ASR models using the dataset, we demonstrate its technical value in capturing stuttered speech at an unprecedented scale and diversity – enabling better understanding and mitigation of fluency bias in ASR – and its social value in promoting self-advocacy and structural change for PWS in China. By foregrounding the lived experiences of PWS in their own voices, we also see the potential of this dataset to normalize speech disfluencies and cultivate deeper empathy for stuttering within the AI research community.

Free, publicly-accessible full text available June 23, 2026
